Visual similarity analysis of Chinese characters and its uses in Japanese OCR

Authors

  • Tao Hong
  • Stephen W. K. Lam
  • Jonathan J. Hull
  • Sargur N. Srihari

Abstract

Traditionally, a Chinese or Japanese Optical Character Reader (OCR) has to represent each character category individually, either as one or more feature prototypes or as a structural description composed of manually derived components such as radicals. Here we propose a new approach in which various kinds of visual similarity between different Chinese characters are analyzed automatically at the feature level. Using this method, character categories are related to each other by training on fonts, and character images from a text page can be related to each other based on the visual similarities they share. This provides a way to interpret the character images on a text page systematically, rather than as a sequence of isolated character recognitions. The use of the method for postprocessing in Japanese text recognition is also discussed.
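
The abstract does not say which features or similarity measure the system uses, so the following is only a minimal sketch under loudly stated assumptions, not the authors' method. Character images are toy binary grids of "#" and "." characters; the feature vector is a hypothetical zoned pixel-density measure (zone_features); similarity is cosine similarity; and the threshold of 0.95 is arbitrary. It illustrates the two relations the abstract describes: categories related through prototypes averaged over font samples (category_prototype, related_categories), and images on a page related by shared visual similarity, grouped here with single-link clustering (group_page_images). All function names and data are hypothetical.

```python
from itertools import combinations
from math import sqrt


# Hypothetical feature (an assumption, not the paper's): the fraction of black
# ('#') pixels in each cell of a zones x zones grid laid over the image.
def zone_features(image, zones=2):
    rows, cols = len(image), len(image[0])
    feats = []
    for zr in range(zones):
        for zc in range(zones):
            r0, r1 = zr * rows // zones, (zr + 1) * rows // zones
            c0, c1 = zc * cols // zones, (zc + 1) * cols // zones
            cells = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            feats.append(sum(ch == "#" for ch in cells) / len(cells))
    return feats


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = sqrt(sum(a * a for a in u)), sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def category_prototype(font_samples, zones=2):
    """Average the features of one category rendered in several training fonts."""
    vecs = [zone_features(img, zones) for img in font_samples]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def related_categories(prototypes, threshold=0.95):
    """Label pairs whose prototypes have cosine similarity >= threshold."""
    return [(a, b) for a, b in combinations(sorted(prototypes), 2)
            if cosine(prototypes[a], prototypes[b]) >= threshold]


def group_page_images(images, threshold=0.95, zones=2):
    """Single-link grouping of page images via union-find on pairwise similarity."""
    feats = [zone_features(img, zones) for img in images]
    parent = list(range(len(images)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(images)), 2):
        if cosine(feats[i], feats[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two groups
    groups = {}
    for i in range(len(images)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy 4x4 "character images" (hypothetical data).
A = ["##..", "##..", "....", "...."]
A2 = ["##..", "#...", "....", "...."]
B = ["....", "....", "..##", "..##"]
C = ["#...", "....", "....", "...."]

protos = {"a": category_prototype([A, A2]),
          "b": category_prototype([B]),
          "c": category_prototype([C])}
print(related_categories(protos))         # [('a', 'c')]
print(group_page_images([A, B, A2, C]))   # [[0, 2, 3], [1]]
```

With features this coarse, A and C count as "similar" even though their ink density differs, because cosine similarity ignores overall scale; a real system would need far richer features. One plausible use, in the spirit of the abstract's postprocessing theme but not taken from it, is to check a recognizer's decision for one image against the decisions made for the other images in its group.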

Publication date: 1995